
GH-418: Reduce arrow-vector dependencies: drop jackson-* (other than jackson-core) and commons-codec #1181

Draft
JonathanGiles wants to merge 1 commit into apache:main from JonathanGiles:reduce-arrow-vector-dependencies


@JonathanGiles


While considering this API, I was concerned by how heavy its dependencies are, so I looked into the feasibility of removing them. This PR is not meant to be a final, complete solution, but a starting point for a discussion about the appetite for reducing the dependency footprint, to make the library more palatable for developers building libraries (as I am with the Azure SDKs for Java).

AI Disclosure: I built this on my machine with the help of coding agents (Claude Opus 4.8) using the GitHub Copilot CLI tooling.

What's Changed

Removes several heavy dependencies from arrow-vector by migrating its JSON handling to the lightweight jackson-core streaming API and the JDK's built-in HexFormat.

Dependencies dropped from arrow-vector:

  • com.fasterxml.jackson.core:jackson-databind
  • com.fasterxml.jackson.core:jackson-annotations
  • com.fasterxml.jackson.datatype:jackson-datatype-jsr310
  • commons-codec:commons-codec

jackson-core is retained (used for streaming JSON read/write).

How:

  • Schema, Field, DictionaryEncoding, and ArrowType (+ generated subtypes) JSON (de)serialization rewritten from ObjectMapper + jackson annotations (@JsonCreator / @JsonProperty / @JsonTypeInfo) to jackson-core streaming (JsonGenerator / JsonParser), with two small internal helpers: JsonValues (parser→tree + typed extractors) and JsonStringSerializer (compact toString() JSON).
  • extension/OpaqueType and the IPC JsonFileReader / JsonFileWriter migrated to the streaming API.
  • Hex encoding/decoding in the IPC JSON files switched from commons-codec Hex to java.util.HexFormat.
  • Text no longer carries a jackson @JsonSerialize annotation (and its inner TextSerializer is removed).
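The hex change in the list above can be illustrated with a small, standalone sketch. It is not the PR's code: the class and method names are hypothetical, and it assumes Java 17 or later, where java.util.HexFormat is available. It only shows the JDK calls that can stand in for a third-party hex codec.

```java
import java.util.Arrays;
import java.util.HexFormat;

// Hypothetical helper, not taken from the PR: hex round-trip using only the JDK.
final class HexFormatSketch {
    // HexFormat.of() formats lowercase digits with no delimiter, prefix, or suffix.
    private static final HexFormat HEX = HexFormat.of();

    static String encode(byte[] bytes) {
        return HEX.formatHex(bytes);
    }

    static byte[] decode(String hex) {
        // parseHex accepts upper- and lowercase digits and rejects odd-length input.
        return HEX.parseHex(hex);
    }

    public static void main(String[] args) {
        byte[] raw = {0x00, 0x1f, (byte) 0xff};
        String hex = encode(raw);
        System.out.println(hex);                             // prints 001fff
        System.out.println(Arrays.equals(raw, decode(hex))); // prints true
    }
}
```

Whether the real writer emits upper- or lowercase digits is not stated in the description; HexFormat.of().withUpperCase() is available if uppercase output is needed.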

Compatibility preserved deliberately:

  • JsonStringHashMap / JsonStringArrayList had a static ObjectMapper field removed, which would have changed their implicitly-computed serialVersionUID and broken Java deserialization of objects written by older Arrow versions (e.g. blobs stored by H2, objects serialized by Spark). The original serialVersionUID values are now pinned explicitly to retain wire compatibility.
  • These collections' toString() can contain java.time values (from temporal vectors' getObject()). The previous output used JavaTimeModule's numeric form (e.g. a LocalDateTime rendered as [2021,1,2,3,4,5], a Duration rendered as 90.000000000). That exact output is reproduced in JsonStringSerializer, so toString() is byte-for-byte unchanged. A clearly-marked code block documents how to revert to native ISO-8601 output if desired.
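To make the numeric form concrete, here is a minimal sketch that produces the two example outputs quoted above. It is not the PR's JsonStringSerializer: the class name is hypothetical, it always emits six LocalDateTime fields, and it assumes non-negative durations, so it is a simplification rather than a full reproduction of JavaTimeModule's behavior.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.Locale;

// Hypothetical illustration only; covers just the two examples from the description.
final class LegacyTimeFormatSketch {

    // Renders a LocalDateTime as a numeric array: [year,month,day,hour,minute,second].
    static String format(LocalDateTime t) {
        return "[" + t.getYear() + "," + t.getMonthValue() + "," + t.getDayOfMonth() + ","
            + t.getHour() + "," + t.getMinute() + "," + t.getSecond() + "]";
    }

    // Renders a non-negative Duration as decimal seconds with nine fractional digits.
    static String format(Duration d) {
        return String.format(Locale.ROOT, "%d.%09d", d.getSeconds(), d.getNano());
    }

    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.of(2021, 1, 2, 3, 4, 5);
        Duration d = Duration.ofSeconds(90);
        System.out.println(format(t)); // prints [2021,1,2,3,4,5]
        System.out.println(format(d)); // prints 90.000000000
        // The JDK's native ISO-8601 forms, for comparison:
        System.out.println(t);         // prints 2021-01-02T03:04:05
        System.out.println(d);         // prints PT1M30S
    }
}
```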

This contains breaking changes.

Breaking changes

| # | Change | Impact / migration |
|---|--------|--------------------|
| 1 | Removed public class org.apache.arrow.vector.util.ObjectMapperFactory | External callers should use their own ObjectMapper (add jackson-databind directly). It only configured JavaTimeModule; supply that module if you need java.time support. |
| 2 | Removed public inner class Text.TextSerializer (and Text's @JsonSerialize) | If you serialized Text via an external jackson ObjectMapper, register a custom serializer. |
| 3 | Jackson annotations removed from Schema / Field / DictionaryEncoding / ArrowType | Code that (de)serialized these POJOs with an external ObjectMapper relying on Arrow's annotations will no longer work; use Schema.fromJSON(String) / Schema.toJson() (public API, unchanged signatures) instead. |
| 4 | Transitive dependencies removed | jackson-databind, jackson-annotations, jackson-datatype-jsr310, and commons-codec are no longer pulled in transitively via arrow-vector. Downstreams that relied on this transitively must declare them directly. |
| 5 | module-info requires reduced | arrow-vector's module no longer requires com.fasterxml.jackson.databind, …annotation, …jsr310, or org.apache.commons.codec. |

Non-breaking behavioral notes

  • toString() of complex vectors containing java.time values is unchanged (legacy numeric format reproduced); serialVersionUID of JsonStringHashMap / JsonStringArrayList is preserved.
  • No change to the Arrow IPC binary format or the integration-test JSON wire format (JSON object key ordering is not significant and is unaffected; JsonFileWriter emits raw epoch numbers as before).
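The serialVersionUID point in the notes above relies on a standard Java serialization rule: an explicitly declared serialVersionUID replaces the value the runtime would otherwise compute from the class's shape. The sketch below illustrates that rule with a hypothetical class and an arbitrary UID; neither is taken from Arrow.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.util.LinkedHashMap;

// Hypothetical illustration; the class name and UID value are made up.
final class PinnedSerialVersionUidSketch {

    // With an explicit UID, later changes to members do not alter the
    // stream identity, so previously written blobs still deserialize.
    static final class PinnedMap<K, V> extends LinkedHashMap<K, V> {
        private static final long serialVersionUID = 1234567890123L;
    }

    // Returns the UID that Java serialization will write for the class.
    static long streamUid(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    // Serializes and deserializes a value in memory.
    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PinnedMap<String, Integer> map = new PinnedMap<>();
        map.put("a", 1);
        System.out.println(streamUid(PinnedMap.class)); // prints 1234567890123
        System.out.println(roundTrip(map).equals(map)); // prints true
    }
}
```

Pinning only helps if the declared value equals the UID that older releases computed implicitly, which is why the description says the original values are retained rather than new ones chosen.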

Testing

  • arrow-vector: full suite 1131 tests pass.
  • arrow-tools: 13 pass (exercises the Arrow↔JSON file round-trip).
  • adapter/jdbc: 152 pass (exercises JsonStringHashMap serialization + H2 blob deserialization).
  • spotless:check + checkstyle:check clean.

Closes #418.

… jsr310 and commons-codec

Replace jackson-databind/annotations/jackson-datatype-jsr310 usage in
arrow-vector with the jackson-core streaming API (JsonGenerator/JsonParser)
plus small hand-written helpers (JsonValues, JsonStringSerializer), and
replace commons-codec hex handling with java.util.HexFormat.

- Schema/Field/DictionaryEncoding/ArrowType JSON (de)serialization now uses
  jackson-core streaming instead of ObjectMapper + jackson annotations.
- OpaqueType and the IPC JsonFileReader/JsonFileWriter migrated to streaming.
- JsonStringHashMap/JsonStringArrayList keep their JSON toString() via
  JsonStringSerializer; explicit serialVersionUID values are pinned to
  preserve Java deserialization compatibility, and JavaTimeModule's exact
  numeric output for java.time values is reproduced for byte-for-byte
  toString() compatibility.
- Removed the public ObjectMapperFactory utility (callers updated) and the
  Text jackson @JsonSerialize annotation.

This contains breaking changes (removed public ObjectMapperFactory and
Text.TextSerializer, removed jackson annotation/databind interop, dropped
transitive databind/annotations/jsr310/commons-codec dependencies, and
module-info requires changes).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@JonathanGiles changed the title from "Reduce arrow-vector dependencies: drop jackson-* (other than jackson-core) and commons-codec" to "GH-418: Reduce arrow-vector dependencies: drop jackson-* (other than jackson-core) and commons-codec" on Jun 11, 2026

@JonathanGiles


I don’t appear to have permission to add labels on this fork-based PR.

Could a maintainer please add the enhancement label? Thanks

@lidavidm added the enhancement label (PRs that add or improve features) on Jun 11, 2026

Labels

breaking-change enhancement PRs that add or improve features.


Development

Successfully merging this pull request may close these issues.

[Java] Remove Jackson from compile-time dependencies for arrow-vector

2 participants